Image Captioning Model Using Part-of-Speech Guidance Module for Description With Diverse Vocabulary

Abstract

Image captioning aims to generate human-like sentences that describe the content of an image. Recent developments in deep learning (DL) have made it possible to caption images with accurate descriptions and detailed expressions. However, because DL learns the relationship between images and captions, it constructs sentences from words that occur frequently in the dataset. These generated sentences are highly accurate. Unlike human-written captions, however, they have low lexical diversity because of their limited vocabulary. Therefore, in this paper, we propose a Part-Of-Speech (POS) guidance module and a multimodal-based image captioning model. The model determines the intensity of word sequences and generates sentences through POS to enhance the lexical diversity of DL. The proposed module enables rich expression by controlling the information used to predict words according to the predicted POS. Then, the multimodal layer adds the output vector of the POS guidance module to that of the Bi-LSTM and uses the result to predict the next word of the caption, taking the grammatical structure into account. We trained and tested the proposed model on the Flickr30K and MS COCO datasets and compared the results with current state-of-the-art studies. We also analyzed the Type-Token Ratio (TTR) and confirmed several...
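
The abstract describes the POS guidance module and the multimodal layer only at a high level. The Python sketch below illustrates one plausible reading of it, under explicit assumptions. A predicted POS distribution sets a scalar gate that mixes an image vector with a word-context vector. The multimodal layer then adds this guidance vector to the Bi-LSTM output and projects the sum onto the vocabulary with a softmax. The following details are hypothetical and do not come from the paper:

- the gate formula
- the per-POS weights (`pos_weights`)
- the toy dimensions
- all function names

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pos_gate(pos_probs: Vector, pos_weights: Vector) -> float:
    """Hypothetical gate: expected per-POS weight under the predicted POS distribution.

    With weights in [0, 1] and probabilities summing to 1, the gate stays in [0, 1].
    """
    return sum(p * w for p, w in zip(pos_probs, pos_weights))


def guidance_vector(image_vec: Vector, word_vec: Vector, gate: float) -> Vector:
    """Mix image and word-context information according to the gate (assumption)."""
    return [gate * i + (1.0 - gate) * w for i, w in zip(image_vec, word_vec)]


def multimodal_next_word(guidance: Vector, lstm_out: Vector, w_vocab: List[Vector]) -> Vector:
    """Add the guidance vector to the Bi-LSTM output, then project onto the vocabulary."""
    fused = [g + h for g, h in zip(guidance, lstm_out)]
    logits = [sum(w * f for w, f in zip(row, fused)) for row in w_vocab]
    return softmax(logits)


if __name__ == "__main__":
    # Toy example: 3 hypothetical POS classes (noun, verb, determiner),
    # 2-d feature vectors and a 3-word vocabulary.
    gate = pos_gate([0.7, 0.2, 0.1], [0.9, 0.3, 0.1])
    guide = guidance_vector([1.0, 0.0], [0.0, 1.0], gate)
    probs = multimodal_next_word(guide, [0.1, -0.1], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print(round(gate, 2), [round(p, 3) for p in probs])
```

Adding the two vectors instead of concatenating them keeps the fused vector at the Bi-LSTM's output size. This requires the guidance vector to have the same dimensionality as the Bi-LSTM output.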

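The abstract reports a Type-Token Ratio (TTR) analysis. TTR is the number of distinct word types divided by the total number of tokens, so a higher value indicates a more varied vocabulary. The abstract does not say how captions were tokenized. It also does not say whether TTR was computed per caption or over the whole set of generated captions. The sketch below therefore assumes a simple lowercase regex tokenizer and shows both variants; the function names are illustrative.

```python
import re
from typing import Iterable, List


def tokenize(caption: str) -> List[str]:
    """Lowercase and split on anything that is not a letter, digit or apostrophe (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", caption.lower())


def type_token_ratio(tokens: List[str]) -> float:
    """TTR = number of distinct tokens (types) / total number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty token list")
    return len(set(tokens)) / len(tokens)


def corpus_ttr(captions: Iterable[str]) -> float:
    """TTR over all generated captions pooled into one token stream."""
    tokens = [t for caption in captions for t in tokenize(caption)]
    return type_token_ratio(tokens)


def mean_caption_ttr(captions: Iterable[str]) -> float:
    """Average of per-caption TTRs; short captions push this toward 1.0."""
    scores = [type_token_ratio(tokenize(c)) for c in captions]
    if not scores:
        raise ValueError("no captions given")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    generated = ["A dog runs on the grass.", "A dog plays on the beach."]
    print(corpus_ttr(generated))        # 8 types / 12 tokens
    print(mean_caption_ttr(generated))  # each caption has 6 distinct tokens out of 6
```

On this toy input, the pooled TTR is 8/12 ≈ 0.67, while the per-caption mean is 1.0 because neither short caption repeats a word. A corpus-level figure is therefore likely more informative when comparing the vocabulary diversity of captioning models.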

Journal

Journal title: IEEE Access

Year: 2022

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2022.3169781